Abstract

Cloud providers sell their idle capacity on markets through an auction-like
mechanism to increase their return on investment. The instances sold in this
way are called spot instances. Although spot instances are usually 90%
cheaper than on-demand instances, they can be terminated by the provider
when their bidding prices are lower than market prices. Thus, they are largely
used to provision fault-tolerant applications only. In this paper, we explore
how to utilize spot instances to provision web applications, which are usually
considered availability-critical. The idea is to take advantage of differences
in price among various types of spot instances to reach both high availability
and significant cost saving. We first propose a fault-tolerant model for web
applications provisioned by spot instances. Based on that, we devise novel
cost-efficient auto-scaling policies that comply with the defined fault-tolerant
semantics for hourly billed cloud markets. We implemented the proposed
model and policies both on a simulation testbed for repeatable validation
and on Amazon EC2. The experiments on the simulation testbed and EC2
show that the proposed approach can greatly reduce resource cost and still
achieve satisfactory Quality of Service (QoS) in terms of response time and
availability.

Keywords: Cloud Computing, Auto-scaling, Web Application, Fault 
1. Introduction


There are three common pricing models among current
Infrastructure-as-a-Service (IaaS) cloud providers: on-demand, in which acquired
virtual machines (VMs) are charged periodically at fixed rates; reservation, in
which users pay an up-front fee for each VM to secure availability of usage and
a cheaper price within a certain contract period; and spot.

The spot pricing model was introduced by Amazon to sell its spare
capacity in an open market through an auction-like mechanism. The provider
dynamically sets the market price of each VM type according to real-time
demand and supply. To participate in the market, a cloud user submits a bid
specifying the number of instances of the VM type he wants to acquire and
the maximum unit price he is willing to pay. If the bidding price exceeds the
current market price, the bid is fulfilled. After getting the required spot
VMs, the user pays only the current market price no matter how much he actually
bid, which results in significant cost saving compared to VMs billed at
on-demand prices (usually only 10% to 20% of the latter) [1]. However,
obtained spot VMs will be terminated by the cloud provider whenever their
market prices rise beyond the bidding prices.

Such a model is ideal for fault-tolerant and non-time-critical applications
such as scientific computing, big data analytics, and media processing
applications. On the other hand, it is generally believed that availability- and
time-critical applications, like web applications, are not suitable to be
deployed on spot instances.

In contrast, in this paper we illustrate that, with an effective fault-tolerant
mechanism and carefully designed policies that comply with the fault-tolerant
semantics, it is also possible to reliably scale web applications using spot
instances to reach both high QoS and significant cost saving.

The spot market is similar to a stock market in that, though possibly following
the general trends, each listed item has its distinctive market behaviour
according to its own supply and demand. In this kind of market, price
differences often appear, with some types of instances sold at expensive prices
due to high demand while others remain unfavoured, leading to attractive deals.
Figure 1 depicts a period of Amazon EC2’s spot market history. Within this
time frame, there were always some spot types sold at discounted prices. By
exploiting the diversity in this market, cloud users can utilize spot instances
as long as possible to further reduce their cost. Recently, Amazon introduced
the Spot Fleet API [2], which allows users to bid for a pool of resources at
once. The provision of resources is automatically managed by Amazon using
the combination of spot instances with the lowest cost. However, it still lacks
the fault-tolerant capability to avoid the availability and performance impact
caused by sudden termination of spot instances, and thus is not suitable to
provision web applications.

Figure 1: One week spot price history from March 2nd 2015 18:00:00 GMT
in Amazon EC2’s us-east-1d Availability Zone

To fill this gap, we propose a reliable auto-scaling system for web
applications using heterogeneous spot instances along with on-demand
instances. Our approach not only greatly reduces the financial cost of using
cloud resources, but also ensures high availability and low response time,
even when some types of spot VMs are terminated unexpectedly by the cloud
provider simultaneously or consecutively within a short period of time.

The main contributions of this paper are: 

• a fault-tolerant model for web applications provisioned by spot
instances;

• cost-efficient auto-scaling policies that comply with the defined fault- 
tolerant semantics using heterogeneous spot instances; 

• event-driven prototype implementations of the proposed auto-scaling 
system on CloudSim [3] and Amazon EC2 platform; 

• performance evaluations through both repeatable simulation studies 
based on historical data and real experiments on Amazon EC2; 
Figure 2: Proposed Auto-scaling system architecture


The remainder of the paper is organized as follows. We first model our
problem in Section 2. In Section 3, we propose the base auto-scaling policies
using heterogeneous spot instances under an hourly billed context. Section 4
explains the optimizations we propose on the initial policies. Section 5
briefly introduces our prototype implementations. We present and analyze
the results of the performance evaluations in Section 6 and discuss the related
work in Section 7. Finally, we conclude the paper and outline our future work.

2. System Model 

For the reader’s convenience, the symbols used in this paper are listed in
Table 1.

2.1. Auto-scaling System Architecture 

As illustrated in Figure 2, our auto-scaling system provisions a single
tier (usually the application server tier) of an application using a mixture
of on-demand instances and spot instances. The provisioned on-demand
instances are homogeneous instances of the type that is most cost-efficient
for the application, whilst spot instances are heterogeneous.

Like other auto-scaling systems, our system is composed of the monitoring
module, the decision-making module, and the load balancer. The monitoring
module consists of multiple independent monitors that are responsible for
fetching the newest corresponding system information, such as resource
utilizations, request rates, spot market prices, and VMs’ statuses, into the
system. The decision-making module then makes scaling decisions according to
the obtained information, based on the predefined strategies and policies, when
necessary. Since the provisioned virtual cluster in our proposed system is
heterogeneous, the load balancer should be able to distribute requests according
to the capability of each attached VM. The algorithm we use in this case is
weighted round robin.

Table 1: List of Symbols

Symbol          Meaning
--------------  --------------------------------------------------------------
$T$             The set of spot types
$M_{min}$       The minimum allowed resource margin of an instance
$M_{def}$       The default resource margin of an instance
$Q$             The quota for each spot group
$R$             The required resource capacity for the current load
$F_{max}$       The maximum allowed fault-tolerant level
$f$             The specified fault-tolerant level
$O$             The minimum percentage of on-demand resources in the provision
$S$             The maximum number of selected spot groups in the provision
$r_o$           The resource capacity provisioned by on-demand instances
$s$             The number of chosen spot groups
$vm$            The VM type
$vm_o$          The on-demand VM type
$c_{vm}$        The hourly on-demand cost of the vm type instance
$num(c, vm)$    The function returns the number of vm type instances required
                to satisfy resource capacity c
$C_o$           The hourly cost of provision in on-demand mode
$tb_{vm}$       The truthful bidding price of vm spot group
$m$             The dynamic resource margin of an instance

Figure 3: Naive provisioning using spot instance

The application hosted by the system should be stateless. This restriction
does not reduce the applicability of the system, as modern cloud applications
are meant to be developed in a stateless way in order to realize high
scalability and availability [1]. In addition, stateful applications can be
easily transformed into stateless services using various means, e.g., storing
the session data in a separate memcache cluster.

2.2. Fault-Tolerant Mechanism 

Suppose there are sufficient temporal gaps between price variation events
of various types of spot VMs; then increasing spot heterogeneity in the
provision can improve robustness. As illustrated in Figure 3(a), the
application is fully provisioned using 40 m3.medium spot VMs only, which may
lead it to lose 100% of its capacity when the market price of m3.medium goes
beyond the bidding price. By respectively provisioning 75% and 25% of the total
required capacity using 30 m3.medium and 5 m3.large spot VMs in Figure 3(b),
it will lose at most 75% of its processing capacity when the price of either
chosen type rises above the bidding price. Furthermore, if it is provisioned
with equal capacity using the two types of spot VMs, like in Figure 3(c),
termination of either type of VMs will only cause it to lose 50% of its
capacity.

Footnote: The red rectangles in the provisioning figures stand for the minimum
amount of capacity required to process the current workload. Its value is
dynamic and proportional to the changing workload, and so is the amount of
redundancy for fault-tolerance.

Figure 4: Provisioning for different fault-tolerant levels

This is still unsatisfactory, as we demand application performance to be
intact even when unexpected termination happens. The simple solution is
to further over-provision the same amount of capacity using another spot
type. In the example illustrated in Figure 4(b), this is 50% of the required
capacity provisioned using 9 c3.large instances. In this way, the application is
now able to tolerate the termination of any involved type of VMs and remain
fully provisioned. After detecting the termination, the scaling system can
either provision the application using another type of spot VMs or switch
to on-demand instances. Application performance is unlikely to be affected
if no other termination happens before the scaling operation that repairs the
provision fully completes.

However, it takes quite a long time to acquire and boot a VM (around
2 minutes for on-demand instances and 12 minutes for spot instances).
Hence, there is a substantial possibility that another type of spot VMs could
be terminated within this time window. To counter such a situation, the
application needs to be further over-provisioned using extra spot types. We
define the fault-tolerant level of our auto-scaling system as the maximum
number of spot types that can be unexpectedly terminated without affecting
application performance before its provision can be fully recovered. Figure 4
shows provision examples that comply with fault-tolerant levels zero, one,
two, and three in our definition, with each spot type provisioning 50% of the
required capacity.

Footnote: According to Amazon’s specification, the capacity of 1 m3.large
instance is equal to the capacity of 2 m3.medium instances.

Figure 5: Provisioning for different fault-tolerant levels using 2 more spot
types

Note that setting the fault-tolerant level to zero is usually not recommended.
Though using multiple types of spot instances confines the amount of resource
loss when failures happen, with no over-provision to compensate for resource
loss, it may frequently cause performance degradation, as failure probability
becomes higher when more types of spot instances are involved.


2.3. Reliability and Cost Efficiency 

Though the provisions shown in Figures 4(b), 4(c), and 4(d) successfully
increase the reliability of the application, they are not cost-efficient. The
three provisions respectively over-provision 50%, 100%, and 150% of the
resources required by the application, which greatly diminishes the cost saving
of using spot instances.

One possible improvement is to provision the application using a larger
number of spot types. The illustrative provisions in Figure 5 employ two more
spot types than those used in Figure 4 to reach the corresponding
fault-tolerant levels. As a result, the total over-provisioned capacities for
the three cases are reduced to 25%, 50%, and 75%. Though the provisions now
might become more volatile with more types of spot VMs involved, the increased
risk is manageable by the fault-tolerant mechanism with over-provision.

Figure 6: Provisioning for different fault-tolerant levels using mixture of
on-demand and spot instances

The other choice to reduce over-provision is to provision the application
with a mixture of on-demand instances and spot instances. As demonstrated
in Figure 6, there are now only 20%, 40%, and 60% over-provisioned capacities
if 20% of the required resource capacity is provisioned by on-demand
instances. Moreover, using on-demand resources further confines the amount
of capacity that could be lost unexpectedly, thus improving robustness. On
the other hand, this method incurs more financial cost.

We define the total capacity that is provisioned by the same type of spot VMs
as a Spot Group. In addition, we define the Quota ($Q$), which is the capacity
each spot group needs to provision given the capacity provisioned by on-demand
resources ($r_o$) and the fault-tolerant level ($f$). It is calculated as:

$$Q = \frac{R - r_o}{s - f} \tag{1}$$

where $R$ represents the required capacity for the current load, and $s$ denotes
the number of chosen spot types. The minimum amount of capacity that is
required to over-provision can then be calculated as $Q * f$.

We call a provision safe if the provisioned capacity of each spot group
is larger than $Q$. Hence, the problem of scaling web applications using
heterogeneous spot VMs is transformed into dynamically selecting spot VM types
and provisioning the corresponding spot and on-demand VMs to keep the
provision in a safe state with minimum cost when the application workload
increases, and deprovisioning the various types of VMs in a timely manner when
they are no longer needed.
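
The quota and the safety condition can be illustrated with a minimal Python
sketch. It assumes the quota is Q = (R - r_o) / (s - f), which reproduces the
over-provision percentages quoted for Figures 4 to 6, and it assumes that a
group providing exactly Q counts as satisfying its quota, as in those
examples. The function names `quota` and `is_safe` are hypothetical.

```python
def quota(required, on_demand, groups, fault_level):
    """Capacity each spot group must provide, assuming Q = (R - r_o) / (s - f)."""
    if groups <= fault_level:
        raise ValueError("need more spot groups than the fault-tolerant level")
    return (required - on_demand) / (groups - fault_level)


def is_safe(group_capacities, q):
    """Assumption: a group that provides at least Q satisfies its quota."""
    return all(capacity >= q for capacity in group_capacities)


def over_provision(required, on_demand, groups, fault_level):
    """Extra capacity beyond the requirement, which equals Q * f."""
    return quota(required, on_demand, groups, fault_level) * fault_level
```

With R = 100, r_o = 20, s = 5, and f = 1, each group needs a quota of 20 and
the provision carries 20 units of over-provision, matching the 20% case in the
mixed on-demand and spot example.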


3. Scaling Policies 

Based on the previous fault-tolerant model, we propose cost-efficient
auto-scaling policies that comply with the defined fault-tolerant semantics for
hourly billed cloud markets such as Amazon EC2.

3.1. Capacity Estimation and Load Balancing 

Our auto-scaling system is aware of multiple resource dimensions (such
as CPU, memory, network, and disk I/O). It needs a profile of the target
application that records its average resource consumption in all the considered
dimensions. Currently, the profiling needs to be performed offline, but our
approach is open to integrating dynamic online profiling.

With the profile, the system is able to estimate the processing capability
of each spot type in the context of the application being scaled. Based on
that, it can determine how to distribute incoming requests to the heterogeneous
VMs to balance their loads. In addition, the estimated capabilities
are used in the calculation of scaling plans as well.
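
One way to picture the capability estimate is the following Python sketch. It
is an illustration under assumptions not stated in the paper: the profile is
taken to be the average consumption of one request in each dimension, a VM's
capability is taken to be limited by its scarcest dimension, and the weighted
round robin weights are the capabilities relative to the weakest VM. The names
`capability` and `balancer_weights` and all numbers are hypothetical.

```python
def capability(vm_capacity, request_profile):
    """Requests a VM can sustain, limited by its scarcest resource dimension."""
    return min(vm_capacity[d] / request_profile[d] for d in request_profile)


def balancer_weights(vms, request_profile):
    """Integer weights for weighted round robin, relative to the weakest VM."""
    caps = {name: capability(spec, request_profile) for name, spec in vms.items()}
    weakest = min(caps.values())
    return {name: round(cap / weakest) for name, cap in caps.items()}


# Hypothetical profile: CPU milliseconds and memory megabytes per request.
profile = {"cpu": 2, "memory": 4}
cluster = {
    "small": {"cpu": 1000, "memory": 4000},
    "large": {"cpu": 2000, "memory": 8000},
}
weights = balancer_weights(cluster, profile)
```

Here the CPU dimension is the bottleneck for both VMs, so the larger VM
receives twice the share of requests.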


3.2. Spot Mode and On-Demand Mode 

Our scaling system runs interchangeably in Spot Mode and On-Demand
Mode. Spot Mode provisions the application in the way explained in Section 2.3.
In Spot Mode, the user needs to specify the minimum percentage of required
resources provisioned by on-demand instances, symbolized as $O$. The user can
also set a limit on the number of selected spot groups in the provision, denoted
as $S$. To define these parameters, users can utilize the simulation tool we
implemented (described in Section 5) to find the optimal configurations
according to the recent spot market history without running real tests on the
cloud. Furthermore, these parameters could be dynamically adjusted using machine
learning technologies; we leave this as future work. In On-Demand
Mode, the application is fully provisioned by on-demand instances without
over-provision. Switches between modes are dynamically triggered by the scaling
policies detailed in the following sections.

3.3. Truthful Bidding Prices 

Bidding truthfully means the participant in an auction always bids the
maximum price he is willing to pay. In order to guarantee cost-efficiency,
the truthful bidding price for each VM type in our policies is calculated
dynamically according to the real-time workload and provision. Before computing
them, we first calculate the hourly baseline cost if the application is
provisioned in On-Demand Mode, which can be represented as:

$$C_o = num(R, vm_o) * c_{vm_o} \tag{2}$$

where the function $num(R, vm_o)$ returns the minimum number of instances of
the on-demand VM type required to process the current workload, and $c_{vm_o}$
is the on-demand hourly price of the on-demand instance type. Then the truthful
bidding price of spot type $vm$ is derived as follows:

$$tb_{vm} = \frac{C_o - num(r_o, vm_o) * c_{vm_o}}{s * num(Q, vm)} \tag{3}$$

where $num(r_o, vm_o)$ and $num(Q, vm)$ are interpreted similarly to
$num(R, vm_o)$ in Equation (2).

This ensures that even in the worst situation, in which all chosen spot types’
market prices are equal to their corresponding truthful bidding prices, the
total hourly cost of the provision will not exceed that in On-Demand Mode.
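
The worst-case guarantee can be checked numerically with a short Python
sketch. It assumes the truthful bidding price is the on-demand baseline cost
minus the cost of the provisioned on-demand instances, divided evenly over the
s * num(Q, vm) spot instances. The dictionary layout and the prices below are
hypothetical.

```python
import math


def num(capacity, vm_capacity):
    """Minimum number of instances of one type that covers `capacity`."""
    return math.ceil(capacity / vm_capacity)


def truthful_bid(required, on_demand_capacity, quota, groups, od_type, spot_type):
    """Truthful bidding price of one spot type under the assumed Equation (3)."""
    baseline = num(required, od_type["capacity"]) * od_type["price"]
    od_cost = num(on_demand_capacity, od_type["capacity"]) * od_type["price"]
    return (baseline - od_cost) / (groups * num(quota, spot_type["capacity"]))


# Hypothetical setting: R = 40, r_o = 8, s = 4 groups, f = 1, so Q = 32 / 3.
od = {"capacity": 1, "price": 0.1}
spot = {"capacity": 2}
bid = truthful_bid(40, 8, 32 / 3, 4, od, spot)
```

If all four groups were built from this spot type and charged exactly `bid`,
the 24 spot instances plus the 8 on-demand instances would cost the same as
the 40 on-demand instances of On-Demand Mode.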


3.4. Scaling Up Policy

The scaling up policy is called when some instances are terminated
unexpectedly or the current provision cannot satisfy the resource requirement
of the application. In Spot Mode, the resource requirement means the provision
should be safe under the current workload, as defined in Section 2.3. In
On-Demand Mode, it only requires the resource capacity of the
provision to exceed the resource needs of the current workload.

Algorithm 1 is used to find the ideal new provision when the system
needs to scale up. To avoid frequent drastic changes, the algorithm only
provisions VMs incrementally. As shown by line 1 in Algorithm 1, it limits
the number of provisioned on-demand instances to be at least its current
number. For each valid number of on-demand instances, it calls Algorithm
2 to find the corresponding best provision among provisions with various
combinations of spot groups. Similarly, Algorithm 2 (line 11) retains
the spot groups chosen by the current provision and only incrementally adds
new groups according to their cost-efficiency (line 15). If no valid
provision is found, the system switches to On-Demand Mode.

Algorithm 1: Find new provision when the system needs to scale up
Input: R : the current workload
Input: n_c : the number of on-demand VMs in current provision
Input: vm_o : the on-demand VM type
Input: O : the minimum percentage of on-demand resources
Output: target_provision
1 min_vm_o ← max(n_c, num(R * O, vm_o));
2 max_vm_o ← num(R, vm_o);
3 candidate_set ← call Algorithm 2 for each integer n in
  [min_vm_o, max_vm_o];
4 return on-demand provision if candidate_set is empty
5 otherwise the provision with minimum cost in candidate_set;

After the target provision is found, the system compares it with the
current provision and then contacts the cloud provider through its API to
provision the corresponding types of VMs that are in short supply.

In the worst case, the time complexity of the scaling up policy is
$O(N * S * |T|)$, where $N$ is the number of on-demand instances required to
provision the current workload in On-Demand Mode, $S$ denotes the maximum number
of chosen spot groups, and $|T|$ is the number of spot types considered. Since
the parameters are all small integers, the computation overhead of the
algorithm is acceptable in an online decision-making scenario.
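
The interplay of Algorithms 1 and 2 can be sketched in Python as follows. This
is a simplified illustration, not the prototype's code. It assumes that
Q = (R - r_o) / (s - f), that groups are ranked by hourly cost at current
market prices, that every current group is also a key of `spot`, and that
nothing is left for spot groups once on-demand capacity covers the whole
workload. All names and the dictionary layout are hypothetical.

```python
import math


def num(capacity, vm_capacity):
    """Minimum number of instances of one type that covers `capacity`."""
    return math.ceil(capacity / vm_capacity)


def find_provision(n, required, od, spot, current, f, max_groups):
    """Algorithm 2 sketch: cheapest provision with n on-demand VMs, or None."""
    spot_capacity = required - n * od["capacity"]
    if spot_capacity <= 0:
        return None  # assumption: nothing is left for spot groups to cover
    spot_budget = num(required, od["capacity"]) * od["price"] - n * od["price"]
    best = None
    low, high = max(len(current), f + 1), min(len(spot), max_groups)
    for s in range(low, high + 1):
        q = spot_capacity / (s - f)  # quota of each group
        size = {t: num(q, spot[t]["capacity"]) for t in spot}
        bid = {t: spot_budget / (s * size[t]) for t in spot}  # truthful bids
        cost = {t: size[t] * spot[t]["market"] for t in spot}
        eligible = sorted(
            (t for t in spot if t not in current and bid[t] > spot[t]["market"]),
            key=lambda t: (cost[t], t))
        k = s - len(current)
        if len(eligible) < k:
            continue  # not enough affordable new groups for this s
        groups = list(current) + eligible[:k]
        total = n * od["price"] + sum(cost[t] for t in groups)
        if best is None or total < best["cost"]:
            best = {"on_demand": n, "groups": groups, "cost": total}
    return best


def scale_up(required, n_current, od, spot, current, f, max_groups, min_od_share):
    """Algorithm 1 sketch: try each admissible on-demand count, keep the cheapest."""
    low = max(n_current, num(required * min_od_share, od["capacity"]))
    high = num(required, od["capacity"])
    candidates = []
    for n in range(low, high + 1):
        provision = find_provision(n, required, od, spot, current, f, max_groups)
        if provision is not None:
            candidates.append(provision)
    if not candidates:  # fall back to On-Demand Mode
        n = max(n_current, high)
        return {"on_demand": n, "groups": [], "cost": n * od["price"]}
    return min(candidates, key=lambda p: p["cost"])
```

The two nested loops visit at most N on-demand counts and S group counts, and
each iteration scans the |T| spot types, which mirrors the stated worst-case
bound up to the cost of sorting the eligible types.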

3.5. Scaling Down Policy 

Since each instance is billed hourly, it is unwise to shut down an instance
before its current billing hour ends. We therefore decide whether each
instance should be terminated at the end of its billing hour. The specific
decision algorithms are different for on-demand instances and spot instances.
Algorithm 2: Find provision given the number of on-demand instances
Input: n : the number of on-demand VMs
Input: g_c : the set of spot groups in current provision
Input: vm_o : the on-demand VM type
Input: f : the fault-tolerant level
Input: T : the set of spot types
Input: S : the maximum number of chosen spot groups
Output: new_provision
1 min_groups ← max(|g_c|, f + 1);
2 max_groups ← min(|T|, S);
3 if max_groups < min_groups then
4     provision not found;
5 end
6 else
7     for s from min_groups to max_groups do
8         p ← p ∪ (vm_o, n);
9         compute Q using Equation (1);
10        compute tb_vm for each vm in T;
11        p ← p ∪ g_c;
12        groups ← each group not in g_c and whose tb_vm is higher than
          market price;
13        k ← s − |g_c|;
14        if |groups| ≥ k then
15            p ← p ∪ top k cheapest groups in groups;
16            provisions ← provisions ∪ p;
17        end
18    end
19 end
20 return the cheapest provision in provisions;
Algorithm 3: Find target provision when the billing hour of one
on-demand instance is about to end
Input: R : the current workload
Input: n_c : the number of on-demand instances in current provision
Input: vm_o : the on-demand VM type
Input: O : the minimum percentage of on-demand resources
Output: target_provision
1 if n_c < num(R * O, vm_o) then
2     provision not found;
3 end
4 else
5     p_1 ← call Algorithm 2 with n_c;
6     p_2 ← call Algorithm 2 with n_c − 1;
7     return on-demand provision if neither p_1 nor p_2 is found otherwise
      either provision that is cheaper;
8 end


3.5.1. Policy for on-demand instances 

When one on-demand instance is at the end of its billing hour, we not
only need to decide whether the instance should be shut down, but also have
to make changes to the spot groups if necessary. The summarized policy
is abstracted in Algorithm 3. The algorithm first checks whether enough
on-demand instances are provisioned to satisfy the on-demand capacity limit
(lines 1 and 2). If there are sufficient on-demand instances, it endeavours
to find the most cost-efficient provisions with and without the on-demand
instance by calling Algorithm 2 (lines 5 and 6). If the current
provision is in On-Demand Mode and no provision is found without the
on-demand instance, the provision remains in On-Demand Mode. Otherwise,
if a new provision is found without the current instance, the policy switches
the provision to Spot Mode. In the case that the current provision is already
in Spot Mode, it picks whichever provision incurs the lower hourly cost.
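
The decision at the end of an on-demand billing hour can be sketched in Python
as below. The callable `find` stands in for Algorithm 2 and returns either a
provision with a `cost` entry or None; it and the other names are hypothetical.
What the system does when the on-demand capacity limit is not met is not
spelled out in the text, so returning None in that case is an assumption.

```python
def on_demand_hour_ending(n_current, min_on_demand, find, on_demand_provision):
    """Algorithm 3 sketch: choose between keeping and releasing one on-demand VM."""
    if n_current < min_on_demand:
        return None  # assumption: no target provision, the instance is kept
    with_instance = find(n_current)
    without_instance = find(n_current - 1)
    options = [p for p in (with_instance, without_instance) if p is not None]
    if not options:
        return on_demand_provision  # stay in (or fall back to) On-Demand Mode
    return min(options, key=lambda p: p["cost"])
```

A stub such as `lambda n: {"on_demand": n, "cost": 0.1 * n + 0.2}` is enough to
exercise the three branches.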

3.5.2. Policy for spot instances 

When dealing with a spot instance whose billing period is ending, in the
base policy, we simply shut down the instance when the corresponding spot
quota $Q$ can be satisfied without it. This policy is later refined by the
optimizations introduced in Section 4.

3.6. Spot Groups Removal Policy 

Note that in both the scaling up and scaling down policies, we forbid removing
selected spot groups from the provision. Instead, we evict a chosen spot group
when any spot instance of that type is terminated by the provider. Since the
bidding price of each instance is calculated dynamically, instances within the
same spot group may be bid at different prices. This could cause some
instances to remain alive even after the corresponding spot groups are removed
from the provision. We call the instances that are running but do not belong to
any group orphans. Though orphan instances are still in production, they
are not considered a part of the provision according to the fault-tolerant
semantics when making scaling decisions. In the base policies, although they
will not be shut down until their billing hour ends, extra instances still need
to be launched to comply with the fault-tolerant semantics, which causes
resource waste. This drawback is addressed by the optimizations introduced
in the following section.

4. Optimizations 

We have made several optimizations on the above proposed base policies 
to further improve cost-efficiency and reliability of the system. 

4.1. Bidding Strategy

In the scaling policies, spot groups are bid at the truthful bidding prices
calculated by Equation (3) for cost-efficiency. When focusing on
robustness, the system can employ a different strategy to bid higher so as to
hold spot instances as long as possible.

4.1.1. Actual Bidding Strategies

There are two actual bidding strategies embedded in the system, namely the
truthful bidding strategy and the on-demand price bidding strategy.

• Truthful Bidding Strategy: the system always bids the truthful
bidding price calculated by Equation (3) when new spot instances are
launched. Since partial billing hours ended by the cloud provider are free of
charge, cloud users can save money by letting the cloud provider terminate
their spot instances once their market prices exceed the corresponding
truthful bidding prices. On the other hand, it leads to more unexpected
terminations.

• On-Demand Price Bidding Strategy: the system always bids the
on-demand price of the corresponding spot type whenever trying to
obtain new spot instances. This strategy will cost cloud users more
money but provides a higher level of protection against unexpected
terminations.


4.1.2. Revised Spot Groups Removal Policy

In the base policies, less cost-efficient spot groups could remain in the
provision for a long time unless some of their instances are terminated by the
provider. When the actual bids are higher than the truthful bidding prices, the
situation could become worse. Instead of just relying on the provider
terminating uneconomical spot groups, the revised policy actively inspects
whether the market prices of some spot groups have exceeded their corresponding
truthful bidding prices and removes them from the provision. In the meantime,
for spot groups whose market prices are still below their truthful bidding
prices, it looks for chances to replace them with more economical spot groups
that have not been selected. To minimize disturbance to the provision, such
operations should be conducted at long intervals, such as every 30 minutes in
our implementation. Members of removed or replaced spot groups become
orphans.


4.2. Utilizing Orphans

After removing or replacing some spot groups, if the system simply lets
members of these spot groups become orphans and immediately starts
instances of newly chosen spot groups, the stability of the provision will be
affected. Furthermore, as orphans are not considered valid capacity in the
base policies, the system has to provision more resources than necessary during
the transition period, which results in monetary waste.

To alleviate this problem, we aim to utilize as many orphans in the provision
as possible to defer the provisioning of new VMs. As a result, resource
waste can be reduced and cost-efficiency is improved.

We modify the proposed fault-tolerant model to allow a spot group to
temporarily accept instances that are heterogeneous to the spot group type under
certain conditions. Figure 7 illustrates such a provision. In Figure 7(a), the
m1.small group does not have sufficient instances to satisfy its quota. Instead
of launching 2 new m1.small spot instances, the policy now temporarily moves
the available orphan, one m1.medium instance, to the m1.small group to
compensate for the deficiency of its quota. Even though the m1.small group
becomes heterogeneous in this case, it does not violate the fault-tolerant
semantics, as losing any type of spot instances will not influence the
application performance. However, in some situations, heterogeneity in spot
groups could cause violation of the fault-tolerant semantics. For example,
three m1.medium orphans might be spread across three spot groups while the total
capacity of the three instances exceeds the spot quota. Then losing the three
m1.medium instances will violate the fault-tolerant semantics. Fortunately, such
cases are very rare, as orphans are usually small in number and are expected to
be shut down in a short time.

With this relaxation of the fault-tolerant model, the previous scaling up 
and scaling down policies need to be revised to efficiently utilize capacity of 
orphans. 

4.2.1. Revised Scaling Up Policy 

The new scaling up policy uses the same algorithm (Algorithm 1) to
find the target provision. However, instead of simply launching instances to
reach the target provision, the new policy considers whether
it can utilize existing orphans to meet the quota requirements in the target
provision.

The new policy first checks whether the target provision chooses new
spot groups. If there are orphans whose types are the same as any newly
chosen groups, lying either within the orphan queue or other spot groups, they
are immediately moved to the corresponding new spot groups. After that, the
policy endeavours to insert non-utilized orphans from the orphan queue into
spot groups that have not met their quota requirement. If all the orphans
have been utilized and some groups still cannot satisfy their quota, new spot
instances of the corresponding types are then launched.
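
The three steps of the revised policy can be sketched in Python as follows.
The data layout is an assumption made for illustration: each group is a list
of `(vm_type, capacity)` pairs keyed by the group type, every group has an
entry in `quotas`, and the orphan queue is served in first-in-first-out order.
The function names are hypothetical.

```python
def capacity(members):
    """Total capacity of a list of (vm_type, capacity) pairs."""
    return sum(cap for _, cap in members)


def place_orphans(groups, quotas, orphan_queue, new_types):
    """Return (groups, remaining queue, capacity still to launch per group)."""
    groups = {g: list(members) for g, members in groups.items()}
    queue = list(orphan_queue)
    # Step 1: orphans of a newly chosen type move into that new group,
    # whether they sit in the orphan queue or inside another group.
    for new in new_types:
        members = groups.setdefault(new, [])
        members += [vm for vm in queue if vm[0] == new]
        queue = [vm for vm in queue if vm[0] != new]
        for other, other_members in groups.items():
            if other != new:
                members += [vm for vm in other_members if vm[0] == new]
                other_members[:] = [vm for vm in other_members if vm[0] != new]
    # Step 2: unused orphans fill groups that have not met their quota.
    for g, members in groups.items():
        while queue and capacity(members) < quotas[g]:
            members.append(queue.pop(0))
    # Step 3: whatever is still missing must be launched as new spot VMs.
    deficits = {g: max(0, quotas[g] - capacity(m)) for g, m in groups.items()}
    return groups, queue, deficits
```

In the m1.small example, a group holding two m1.small instances with a quota
of 4 absorbs the queued m1.medium orphan and needs no new instances.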

4.2.2. Revised Scaling Down Policy

Regarding the policy for on-demand instances that are close to the end of their
billing hour, the new policy utilizes the same mechanism as the revised scaling
up policy to provision any changes between the current provision and the target
provision.

For the spot scaling down policy, if the spot instance is in the orphan queue,
it is immediately shut down. If it is within the spot group of the same
type, it is shut down when the spot quota can be satisfied without it. In the
case that the instance is an orphan within another spot group, the new policy
shuts down the instance and in the meantime starts a certain number of spot
instances of the spot group type to compensate for the capacity loss.
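
The three cases of the spot scaling down policy can be written as a small
decision function. This Python sketch assumes that the number of replacement
instances is the smallest count of the group's own type that restores the
quota; the paper only says that a certain number is started. The function name
and parameters are hypothetical.

```python
import math


def spot_hour_ending(vm_type, vm_capacity, group_type, group_capacity, quota,
                     group_vm_capacity):
    """Return (shut_down, replacements) for a spot VM whose billing hour ends.

    group_type is None when the VM waits in the orphan queue; group_capacity
    is the capacity of the hosting group including the VM itself.
    """
    if group_type is None:
        return True, 0  # orphan in the queue: shut down immediately
    remaining = group_capacity - vm_capacity
    if vm_type == group_type:
        return remaining >= quota, 0  # shut down only if the quota still holds
    # Orphan hosted by a group of another type: shut down and compensate.
    shortfall = max(0.0, quota - remaining)
    return True, math.ceil(shortfall / group_vm_capacity)
```

For an m1.medium orphan of capacity 2 inside an m1.small group that exactly
meets a quota of 4, the function asks for the orphan to be shut down and for
two m1.small instances to be started.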

4.3. Reducing Resource Margin 

For applications running on traditional auto-scaling platforms, the
administrator usually leaves a margin at each instance to handle short-term
workload surges in order to buy time for booting up new instances. This margin
empirically ranges from 20% to 25% of the instance’s capacity.

With over-provision already in place in our system, this margin can be 
reduced under Spot Mode provision. We devise a mechanism that dynamically 
changes the margin according to the current fault-tolerant level. Since a 
higher fault-tolerant level leads to more over-provision, we can be more 
aggressive in reducing the margin of each instance. In detail, the dynamic 
margin is determined by the formula: 


$$m = \frac{M_{min} - M_{def}}{F_{max}} \cdot f + M_{def} \qquad (4)$$


where f is the current fault-tolerant level, M_min is the minimum allowed 
margin, e.g., 10%, M_def is the default margin used without dynamic margin 
reduction, e.g., 25%, and F_max is the maximum allowed fault-tolerant level. 
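
As a worked illustration, the sketch below assumes that the margin is a linear interpolation between M_def (at fault-tolerant level 0) and M_min (at level F_max), i.e. m = (M_min - M_def) / F_max * f + M_def. The function name `dynamic_margin` and the handling of F_max = 0 are assumptions for this example only.

```python
def dynamic_margin(f, f_max, m_min=0.10, m_def=0.25):
    """Margin for fault-tolerant level f, assuming linear interpolation.

    Returns m_def when f == 0 and m_min when f == f_max.
    """
    if not 0 <= f <= f_max:
        raise ValueError("f must lie between 0 and f_max")
    if f_max == 0:
        # No over-provision is possible, so keep the default margin.
        return m_def
    return (m_min - m_def) / f_max * f + m_def
```

With the example values from the text (M_min = 10%, M_def = 25%) and F_max = 3, level 1 would yield a margin of about 20% under this assumption.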


5. Implementation 

We implemented a prototype of the proposed auto-scaling system on the 
Amazon EC2 platform using Java, the components of which are illustrated in 
Figure 8. It employs an event-driven architecture in which the monitoring 
modules continuously generate events according to newly obtained information 
and the central processor consumes the events one by one. The monitoring 
modules produce events with various criticality levels and insert them into 
the central priority event queue. They include the resource utilization 
monitors that watch all dimensions of resource consumption of the running 
instances, the billing monitor that tracks the billing hour of each requested 
VM, the VM status monitor that notifies the system when instances come 
online or go offline, the spot price monitor that records the newest spot 
market prices for each considered spot type, and the spot request monitor 
that watches for any unexpected spot termination. On the other side, the 
central event processor fetches events from the event queue and assigns them 
to the corresponding event handlers, which realize the proposed policies to 
make scaling decisions or perform scaling actions. 

[Figure 8, "Event Handlers" panel: Scale Up Event Handler, Billing Hour 
Ending Event Handler, Instances Online Event Handler, Utilization Update 
Event Handler, Scaling Event Handler, Spot Termination Event Handler, Spot 
Price Update Event Handler, Instances Impaired Event Handler] 

Figure 8: Components of the Implemented Auto-scaling System 
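
The interaction between monitors, the priority event queue, and the event handlers can be illustrated with the following sketch. It is written in Python for brevity (the prototype itself is in Java), and the class name `EventProcessor`, the event names, and the numeric priorities are hypothetical; lower numbers are treated as more critical here.

```python
import heapq
import itertools


class EventProcessor:
    """Hypothetical central processor backed by a priority event queue."""

    def __init__(self):
        self._queue = []                   # heap of pending events
        self._counter = itertools.count()  # tie-breaker keeps FIFO order
        self._handlers = {}                # event type -> handler callable

    def register(self, event_type, handler):
        self._handlers[event_type] = handler

    def publish(self, priority, event_type, payload=None):
        # Monitors call this to insert events; lower priority value = more critical.
        heapq.heappush(
            self._queue, (priority, next(self._counter), event_type, payload)
        )

    def run(self):
        # Consume events one by one, most critical first.
        results = []
        while self._queue:
            _, _, event_type, payload = heapq.heappop(self._queue)
            results.append(self._handlers[event_type](payload))
        return results
```

In this sketch, a spot termination event published with priority 0 is handled before a utilization update published earlier with priority 5.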

The prototype implementation provides a general interface for users to 
plug different load balancer solutions into the auto-scaling system. In our 
case, we use HAProxy with the weighted round robin algorithm. It also offers 
an interface that allows users to automatically customize the configurations 
of VMs according to their own available resources after they have been booted. 

For quick concept validation and repeatable evaluation of the proposed 
auto-scaling policies, we created a simulation version of the system. The 
same code base is ported onto the CloudSim [3] toolkit, which provides the 
underlying simulated cloud environment. Assuming bids from the user have 
negligible influence on market prices, the simulation tool provides quick 
and economical validation of the proposed policies using historical data of 
the application and the spot market as input. 

For more details about the implementation, please refer to the released 
code at https://github.com/quchenhao/spot-auto-scaling. 

Figure 9: The English Wikipedia workload from Sep 19th 2007 to Sep 26th 
2007 

6. Performance Evaluation 

6.1. Simulation Experiments 

As stated in Section 5, to allow repeatable evaluation, we developed a 
simulation version of the system that allows us to compare the performance 
of different configurations and policies using traces from real applications 
and spot markets. 

6.1.1. Simulation Settings 

We use a one-week trace of 10% of English Wikipedia requests from Sep 19th 
2007 to Sep 26th 2007 as the workload [7], which is depicted in Figure 9. 
Note that our approach is general purpose and can be applied to any 
workload, as the proposed system does not make assumptions about the 
workload and is fully reactive. We adopt the Wikipedia workload in the 
experiments because it shows significant variations that trigger frequent 
scaling operations and let us observe the behaviour of our system. We believe 
a one-week trace is enough for the purpose of our experiments, as it gives 
the system ample opportunities to exercise the scaling policies. In addition, 
as reported by Eldin et al. [8], the Wikipedia workload reveals a strong 
weekly pattern with only gradual changes in amplitude, level, and shape. 

We consider 13 spot types in Amazon EC2. Their spot prices are simulated 
according to one week of Amazon's spot price history starting from March 2nd 
2015 18:00:00 GMT in the relatively busy us-east region. The spot types 
involved and their corresponding historical market prices are illustrated in 
Figure [1} 

We set the request timeout at 30 seconds. In addition, we set the minimum 
allowed resource margin (M_min) and the default resource margin (M_def) at 
10% and 25%, respectively. We found that the c3.large instance is the most 
cost-efficient type for the Wikipedia application, based on a small-scale 
resource profiling test of the Wikibench application [9] on Amazon EC2 and 
the resource specifications of each instance type released by Amazon. It is 
selected to provision all the on-demand resources in the experiments. All 
simulation experiments start with 5 c3.large on-demand instances. The lengths 
of simulated requests are generated following a pseudo Gaussian distribution 
with a mean of 0.07 ECU and a standard deviation of 0.005 ECU, so that 
different tests using the same random seed receive exactly the same 
workload. The VM start-up, shut-down, and spot requesting delays are 
generated in the same way using pseudo Gaussian distributions. The means of 
these three distributions are 100, 100, and 550 seconds, respectively, and 
the standard deviations are set at 20, 20, and 50 seconds. The test results 
are deterministic and repeatable on the same machine. 
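
The repeatability argument rests on seeding the pseudo-random generator. A minimal sketch of this idea is shown below; the function name `gaussian_samples`, the seed value, and the clamping of negative samples to zero are assumptions for illustration and are not taken from the simulator.

```python
import random


def gaussian_samples(seed, n, mean, std):
    """Draw n pseudo Gaussian samples that are identical for equal seeds."""
    rng = random.Random(seed)  # private generator, unaffected by other code
    # Clamp at zero so that a delay or request length is never negative.
    return [max(0.0, rng.gauss(mean, std)) for _ in range(n)]


# Parameters quoted in the text: request length in ECU, delays in seconds.
request_lengths = gaussian_samples(seed=1, n=1000, mean=0.07, std=0.005)
startup_delays = gaussian_samples(seed=1, n=10, mean=100, std=20)
spot_request_delays = gaussian_samples(seed=1, n=10, mean=550, std=50)
```

Two runs that use the same seed produce the same sequences, which is what makes the simulated experiments comparable across configurations.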

We tested our scaling policies with various fault-tolerant levels and 
different minimum amounts of on-demand resources, which are represented 
respectively as “f-x” and “y% on-demand” in the results. We also tested the 
policies using the two embedded bidding strategies and static/dynamic 
resource margins. 

We concentrate on two metrics, real-time response time of requests (average 
response time per second reported) and total cost of instances, in all 
the experiments. 

6.1.2. Benchmarks 

We compare our scaling policies with two benchmarks: 

• On-Demand Auto-scaling: This benchmark only utilizes on-demand 
instances. It is implemented by restricting the auto-scaling system to 
always stay in On-Demand Mode. 

• One Spot Type Auto-scaling: The auto-scaling policies used in 
this benchmark, like the proposed policies, provision a mixture of 
on-demand resources and spot resources. The benchmark also has a limit 
on the minimum amount of on-demand resources provisioned. However, 
for spot instances, it only provisions one spot group, the one that is 
the most cost-efficient at the moment, without over-provision. If the 
provisioned spot instances are terminated, a new spot group is selected 
and provisioned. If a more economical spot group is found, the old 
spot group is gradually replaced by the new one. It is implemented by 
setting the fault-tolerant level to zero and allowing at most one spot 
group to be provisioned. 

Footnote (Gaussian distribution of request lengths): Since Wikipedia mostly 
serves the same type of request, namely page views, the time taken to process 
each request is likely to fall within a certain interval. To coarsely model 
such behaviour, we utilize a Gaussian distribution. Other distributions with 
a small head and tail can serve the same purpose as well. 

Footnote (mean of 0.07 ECU): A length of 0.07 ECU means that the request 
takes 70 ms to finish if it is computed by a VM equipped with a vCPU as 
powerful as 1 EC2 Compute Unit (ECU). 

Figure 10: Response time for on-demand auto-scaling 


6.1.3. Response time 

Figures 10, 11, 12, and 13 respectively depict the real-time average 
response time of requests using on-demand auto-scaling, one spot type 
auto-scaling, and our approach with the truthful bidding strategy and dynamic 
resource margin. According to the results, on-demand auto-scaling produced 
smooth response time throughout the experiment except for a peak caused by 
the corresponding peak in the workload. All experiments employing one spot 
type auto-scaling experienced periods of request timeouts caused by 
termination of spot instances, and merely increasing the amount of on-demand 
resources could not improve the situation. In contrast, our approach greatly 
reduced such service unavailability even using f-0, with no over-provision 
of resources. By using f-1, we were able to completely eliminate the timeouts 
under the recorded spot market traces. We omit the results for tests using 
f-2 and f-3 as they are similar to those in Figure 13. 

Figure 11: Response time of one spot type auto-scaling with various 
percentages of on-demand resources and truthful bidding strategy 

Figure 12: Response time of f-0 with various percentages of on-demand 
resources, truthful bidding strategy, and dynamic resource margin 

Figure 13: Response time of f-1 with various percentages of on-demand 
resources, truthful bidding strategy, and dynamic resource margin 

Figure 14: Response time of one spot type auto-scaling with various 
percentages of on-demand resources and on-demand bidding strategy 

To show the effect of different bidding strategies, we compare the response 
time results of one spot type auto-scaling using the two proposed bidding 
strategies, as they reveal the most significant difference. As Figure 11 and 
Figure 14 show, service availability can be much improved with higher bidding 
prices using one spot type auto-scaling. On the other hand, the remaining 
timeouts also indicate that increasing bidding prices alone is not enough to 
guarantee high availability. 


6.1.4. Cost 

Table lists the total costs produced by all the experiments. Compared 
to the cost of on-demand auto-scaling, we gained significant cost savings 
using all other configurations. Tests using one spot type auto-scaling with 
0% on-demand resources realized the most cost saving, up to 80.87%, 
notwithstanding its availability issue. 

The results show that the amount of on-demand resources has a significant 
influence on cost saving. It can also be noted that a higher fault-tolerant 
level incurs extra cost. Though the optimal configuration of the 
fault-tolerant level is always application specific, according to our 
results, the configuration using f-1 with 0% on-demand resources is the best 
choice under the current market situation with regard to both financial cost 
and service availability. 

The cost differences caused by different bidding strategies are 
generally small. Therefore, it is better to bid higher to improve availability 
if the user's bidding has negligible impact on the market price. 

As dynamic resource margin is only applicable when the application is 
over-provisioned, we give the results for tests using dynamic resource margin 
only when the fault-tolerant level is higher than zero. According to the 
results, dynamic resource margin can bring extra cost saving, and the amount 
of cost saving increases when more over-provision is necessary (i.e., with a 
higher fault-tolerant level). Though the resulting cost saving is not 
significant, it is safely achieved without sacrificing the availability and 
performance of the application. 


6.2. Real Experiments 

We conducted two real tests on Amazon EC2, using the on-demand 
auto-scaling policies and the proposed auto-scaling policies with the 
configuration of f-1 and 0% on-demand, respectively. Other parameters are 
defined the same as in the simulation tests. 

Figure 15: The Testbed Architecture 

We set up the experimental environment to run the Wikibench [9] benchmark 
tool. The major advantage of this tool compared to other tools such as 
TPC-W, RUBiS, and CloudStone is that it is stateless, which is a 
characteristic of modern highly scalable cloud services. The tool is composed 
of three components: 

• a client driver that mimics clients by continuously sending requests to 
the application server according to the workload trace; 

• a stateless application server installed with the MediaWiki application; 

• a MySQL database loaded with the English Wikipedia data as of 
Jan 3rd, 2008. 

Our aim is to scale the application tier. Thus, we inserted a HAProxy load 
balancer layer into the original architecture in order to let the client 
driver talk to a cluster of servers. The architecture of the testbed is 
illustrated in Figure 15. We picked the first 3 days of the Wikipedia 
workload [7] (Figure 9) and scaled it down to half of its original rate as 
the workload for testing, because Amazon limits the number of instances each 
account can launch. 

The testing environment resided in the Amazon us-east-1d zone, which is 
in a relatively busy region with a higher degree and frequency of price 
fluctuations. Regarding each component, we launched one c4.large instance 
acting as the client driver, one m3.medium instance running the HAProxy load 
balancer, and one instance serving the MySQL database requests. The 
auto-scaling system itself ran on a local desktop computer located remotely 
in Melbourne. Before the tests, we profiled each component to make sure none 
of them would become the bottleneck of the system. 

Figure 17: Response time for spot auto-scaling on Amazon 

The test using the proposed approach started at 3:30am September 9, 
2015, Wednesday, US east time. Its testing period spanned across three busy 
weekdays from Wednesday to Friday. 

Figures 16 and 17 present the real-time response time results of the two 
experiments. Both results suffer from peaks of high response time. By 
studying the recorded logs, we confirmed that they were not caused by a 
shortage of resources, as the resource utilization of all the VMs involved 
never went beyond the safe threshold during both tests. Various other reasons 
can be the culprits, such as cold cache, short-term network issues, 
interference from the shared virtualized environment, and garbage 
collection. We encountered three unexpected terminations during the test of 
our approach. Thanks to the fault-tolerant mechanism and policies, we managed 
to avoid service interruption and performance degradation during those 
periods. In addition, because resources are tighter in on-demand 
auto-scaling, it generally performs worse in response time compared to the 
proposed approach. 

Footnote: The 4th generation instances were introduced between the time we 
performed the simulations and the real experiments. To be consistent, we only 
consider the 13 spot types introduced in Section 6.1.1 for both the 
simulations and the real experiments. 

Table 3: Cost of the Experiments 

Configuration             Cost (USD) 
on-demand                 19.01 
f-1 and 0% on-demand      5.69 

Regarding cost, we calculated the total cost of the application servers in 
both experiments. Table 3 presents the results. The proposed approach reaches 
a 70.07% cost saving. 
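
The reported saving follows directly from the two totals in Table 3 (19.01 USD for on-demand auto-scaling and 5.69 USD for the proposed approach). The short check below reproduces the arithmetic; the function name `cost_saving_percent` is introduced here only for illustration.

```python
def cost_saving_percent(baseline_cost, actual_cost):
    """Relative saving of actual_cost against baseline_cost, in percent."""
    return (baseline_cost - actual_cost) / baseline_cost * 100


saving = round(cost_saving_percent(19.01, 5.69), 2)
print(saving)  # 70.07
```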

6.3. Discussion 

Even with a high fault-tolerant level, the proposed approach cannot 
guarantee 100% availability, and no solution can ever assure absolute 
service continuity due to the nature of the spot market. What our system 
offers is a best effort to counter large-scale surges of the market prices of 
the selected spot types within a short time, which are highly unlikely under 
current market conditions. In fact, we have not encountered any case in which 
more than one spot group failed simultaneously during simulations, real 
experiments, and testing phases. However, market conditions could change. 
Hence, the application provider should adjust the configuration of the 
auto-scaling system dynamically according to the real-time volatility of the 
spot market. In addition, the nature of the application also affects the 
decision. If the application is availability-critical, a higher 
fault-tolerant level is always desirable. Conversely, for some applications, 
such as analytical jobs, even one spot type auto-scaling is acceptable. 

The results presented in Section 6 only indicate the cost saving potential 
of a certain application considering a selected set of spot types under the 
recorded spot market prices and workload traces. Thanks to the dynamic 
truthful bidding price mechanism, even in competitive market conditions, we 
can ensure that the cost reduction gained by our approach will only diminish 
rather than vanish. To reach more cost saving, the application provider can 
take into account a broader set of spot types, which is available in Amazon's 
offering. 

To save cost and time for testing, application providers can tune the 
parameters of the auto-scaling system in a similar way as we did, by first 
utilizing simulation for fast validation and then testing the system in the 
production environment. 

There are also differences in price among the same spot types across 
different availability zones. It is straightforward to extend the current 
fault-tolerant model to utilize spot groups from multiple availability zones. 
Currently, the auto-scaling system limits the selection of spot groups to the 
same availability zone due to charges for traffic across availability zones. 
If the application provider has already adopted a multi-availability-zone 
deployment, such an extension can realize more cost saving. 

The overhead of the auto-scaling system is negligible. As presented in 
Section the time complexity of the scaling policies is not significant. The 
frequency at which the scaling policies are called depends on the monitoring 
interval and the frequency of price changes, both of which are on the scale 
of seconds or longer. 

7. Related Work 

7.1. Horizontally Auto-scaling Web Applications 

Horizontal auto-scaling of web applications has been extensively studied 
and applied [12]. Basically, auto-scaling techniques for web applications can 
be classified into three categories: reactive approaches, proactive 
approaches, and mixed approaches. Reactive approaches scale applications in 
accordance with workload changes. Proactive approaches predict future 
workload and scale applications in advance. Mixed approaches can scale 
applications both reactively and proactively. 

Most industry auto-scaling systems are reactive. Among them, the most 
frequently used service is Amazon's Auto Scaling Service [13]. It requires 
the user to first create an auto-scaling group, which specifies the type of 
VMs and the image to use when launching new instances. The user then defines 
scaling policies as rules like “add 2 instances when CPU utilization is 
larger than 75%”. Another popular service is offered by RightScale. Their 
service is based on a voting mechanism that lets each running instance decide 
whether it is necessary to grow or shrink the size of the cluster based on 
its own condition [14]. 
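
A rule of the quoted form can be modelled in a few lines. The sketch below is a generic illustration of a reactive threshold rule and does not use or reproduce the API of Amazon's Auto Scaling Service; the names `ScalingRule` and `apply_rule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Reactive rule: change the cluster size when a threshold is exceeded."""
    cpu_threshold: float  # CPU utilization in percent
    adjustment: int       # number of instances to add when triggered


def apply_rule(rule, cpu_utilization, current_instances):
    """Return the new cluster size after evaluating the rule once."""
    if cpu_utilization > rule.cpu_threshold:
        return current_instances + rule.adjustment
    return current_instances


# "add 2 instances when CPU utilization is larger than 75%"
rule = ScalingRule(cpu_threshold=75.0, adjustment=2)
```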

Other than just using simple rules to make scaling decisions, researchers 
have developed scaling systems based on formal models. These models aim to 
answer the question of how many resources are actually required to serve a 
certain amount of incoming workload under QoS constraints. Such a model can 
be simply obtained using profiling techniques, as we did in this paper. Other 
commonly adopted approaches include queueing models [15] [16] [17] [18] [19] 
[20] that abstract the application either as a set of parallel queues or as 
a network of queues, and online learning approaches such as reinforcement 
learning [21] [22] [23]. 

Proactive auto-scaling is desirable because the time taken to start and 
configure new VMs creates a resource gap when the workload suddenly surges 
to a level beyond the capability of the available resources. To satisfy a 
strict SLA, it is sometimes necessary to provision enough resources before 
the workload actually rises. As workloads of web applications usually reveal 
temporal patterns, accurate prediction of future workload is feasible using 
state-of-the-art time-series analysis and pattern recognition techniques. A 
lot of them have 
been applied to auto-scaling of web applications [16112111251 l26l l2ll 12111291 • 

Most auto-scaling systems only utilize homogeneous resources, while some, 
including our system, have explored using heterogeneous resources to 
provision web applications. Upendra et al. ED, and Srirama and Ostavar 
[32] adopt integer linear programming (ILP) to model the optimal 
heterogeneous resource configuration problem under SLA constraints. Fernandez 
et al. [23] utilize tree paths to represent different combinations of 
heterogeneous resources and then search the tree to find the most suitable 
scaling plan according to the user's SLA requirements. 

Different from the above works, our objective goes beyond using minimum 
resources to provision the application. Instead, we want to devise fault- 
tolerant mechanism and auto-scaling policies that comply with the fault- 
tolerant semantics to reliably scale web applications on cheap spot instances. 
We believe the reviewed auto-scaling techniques are complementary to our 
approach. The proposed system can incorporate their resource estimation 
models, and workload prediction techniques as well. 




7.2. Application of Spot Instances 

There have been a lot of attempts to use spot instances to cut resource 
cost in various application contexts. Resource provision problems using 
spot instances have been studied for fault-tolerant applications [311 ESI ESI 
EZlEHlEnilSnilllllSllISS] such as high performance computing, data analytics, 
MapReduce, and scientific workflows. 

For these applications, the fault-tolerant mechanism is often built on 
checkpointing, replication, and migration. Multiple novel checkpointing 
mechanisms [m HU Ho] have been developed to allow these applications to 
harness the power of spot instances. SpotOn [H] combines multiple 
fault-tolerant mechanisms to increase the cost-efficiency and performance of 
batch processing applications running on spot instances. 

Regarding web applications, Han et al. [47] proposed a stochastic 
algorithm to plan future resource usage with a mixture of on-demand and spot 
instances. Besides using only homogeneous resources, their problem also 
differs from ours in that they aim to plan the resource usage with knowledge 
of the future, while we provision resources dynamically. Mazzucco and 
Dumas [48] also explored using a mixture of homogeneous on-demand instances 
and spot instances to provision web applications. Instead of building 
a reliable auto-scaling system, their target is to maximize the web 
application provider's profit by using an admission control mechanism at the 
front end to dynamically adapt to sudden changes of available resources. 

Sharma et al. proposed a derivative IaaS cloud platform based on spot 
instances called SpotCheck [49] [50]. To transparently provide high 
availability on spot instances to end users, they incorporated technologies 
such as nested virtualization, live VM migration, and time-bounded VM 
migration with memory checkpointing to dynamically move users' VMs when the 
underlying spot instances are available or revoked. Because of its 
transparency to end users, it is ideal for cloud brokers and large 
organizations with high resource demands. In contrast, our approach is 
lightweight and thus more suitable for small organizations that want to 
harness the power of spot instances by themselves. He et al. [51] from the 
same group evaluated the ability of the approach to reliably run web 
applications on spot instances. Though they do not provision redundant 
capacity as we do, they reported non-negligible overhead incurred by nested 
virtualization. Their proposed system [49] [50] [51] is able to preserve the 
memory state of the revoked spot VMs, which enables it to seamlessly host 
stateful applications. Though our approach requires the application to be 
stateless, this does not reduce its generality, as highly scalable cloud 
applications are expected to be stateless, and stateful applications can 
easily be made stateless by storing session information in a memory cache 
cluster [10]. Their system relies on the termination warnings issued by 
existing providers [52] to be able to conduct migrations in time. Our 
approach is capable of operating in possible future spot markets that do 
not provide termination warnings. 

Recently, Amazon EC2 introduced a new feature called the Spot Fleet API 
[2]. It allows the user to bid for a fixed amount of capacity, possibly 
constituted by instances of different spot types. It continuously and 
automatically provisions the capacity using the combination of instances that 
incurs the lowest cost. However, as its provision decision ignores 
reliability, it is not suitable for provisioning web applications. 

8. Conclusions and Future Work 

In this paper, we explored how to reliably and cost-efficiently auto-scale 
web applications using a mixture of on-demand and heterogeneous spot 
instances. We first proposed a fault-tolerant mechanism that can handle 
unexpected spot terminations using heterogeneous spot instances and 
over-provision. We then devised novel cost-efficient auto-scaling policies 
that comply with the defined fault-tolerant semantics for hourly-billed cloud 
markets. We implemented a prototype of the proposed auto-scaling system on 
Amazon EC2 and a simulation version on CloudSim [3] for repeatable and fast 
validation. We conducted both simulations and real experiments to demonstrate 
the efficacy of our approach by comparing the results with the benchmark 
approaches. 

In the future, we plan to further optimize our system by incorporating 
the following features: 

• selection of spot groups according to predicted spot prices in the near 
future; 

• dynamic decision of the fault-tolerant level and the proportion of 
on-demand instances according to the volatility of the spot market using 
machine learning technologies; 

• an interface that allows web application providers to plug different 
workload prediction techniques into the auto-scaling system to achieve 
proactive auto-scaling; and 

• utilization of spot groups across different availability zones. 